A CONSTRUCTIVE DEFINITION OF DIRICHLET PRIORS


By Jayaram Sethuraman
Abstract 

The “parameter” in a Bayesian nonparametric problem is the unknown distribution $P$ of the observation $X$. A Bayesian uses a prior distribution for $P$ and, after observing $X$, solves the statistical inference problem by using the posterior distribution of $P$, which is the conditional distribution of $P$ given $X$. For Bayesian nonparametrics to be successful, one needs a large class of priors for which posterior distributions can be easily calculated.

Unless $X$ takes values in a finite space, the unknown distribution $P$ varies in an infinite-dimensional space. Thus one has to deal with measures on a complicated space, such as the space of all probability measures on a large space. This has always required more careful attention to the attendant measure-theoretic problems.

A class of priors known as Dirichlet measures has been used for the distribution of a random variable $X$ when it takes values in $\mathbb{R}^k$; see Freedman (1963), Fabius (1964) and Ferguson (1973). This family forms a conjugate family and possesses many pleasant properties.

In this paper we give a simple and new constructive definition of Dirichlet measures and remove the restriction that the basic space should be $\mathbb{R}^k$. We give complete, self-contained proofs of the three basic results for Dirichlet measures:

1. the Dirichlet measure is a probability measure on the space of all probability measures,

2. it gives probability one to the subset of discrete probability measures, and

3. the posterior distribution is also a Dirichlet measure.
1. Introduction.


Bayesian nonparametrics came into vogue in the seventies. Let $X$ be a random variable taking values in a measurable space $(\mathcal{X},\mathcal{B})$ and let its unknown probability measure be $P$. The “parameter” in a Bayesian nonparametric problem is the unknown probability distribution $P$. If $\mathcal{X}$ is not a finite set, this parameter takes values in an infinite-dimensional space, and hence the definition of a prior distribution for $P$ has always required a more careful treatment of the attendant measure-theoretic problems. A practitioner of Bayesian nonparametrics puts a prior distribution on $P$ and gives his answer to the inference problem as the posterior distribution of $P$ given $X$. How do we define such a prior distribution and calculate the posterior distribution? Let $\mathcal{P}$ be the space of probability measures on $(\mathcal{X},\mathcal{B})$ and note that $P$ varies in $\mathcal{P}$. A natural $\sigma$-field in $\mathcal{P}$ is $\mathcal{C}$, the smallest $\sigma$-field generated by sets of the form $\{P : P(B) < r\}$, where $B$ varies in $\mathcal{B}$ and $r$ varies in $[0,1]$. A nonparametric prior for a probability measure $P$ is then a probability measure $\nu$ on $(\mathcal{P},\mathcal{C})$. Let $(P,X)$ be a pair of random variables taking values in $(\mathcal{P}\times\mathcal{X},\mathcal{C}\times\mathcal{B})$ such that $P$ has distribution $\nu$ and such that $X$ given $P$ has distribution $P$. The posterior distribution $\nu_X$ is defined to be the distribution of $P$ given $X$.

Bayesian nonparametrics becomes tractable only if there are examples of priors $\nu$ for which $\nu_X$ is easy to calculate. A collection of prior distributions $\nu_\alpha$ indexed by a parameter $\alpha$ is said to form a conjugate family of priors if the posterior distribution $\nu_X$ is of the form $\nu_{f(\alpha,X)}$ for some function $f(\alpha,X)$ of $\alpha$ and $X$. The class of Dirichlet measures forms a conjugate family, which makes it useful in Bayesian nonparametrics.

Before giving an intuitive definition of a Dirichlet measure, we recall the well-known definition of Dirichlet measures on finite-dimensional spaces. Let $(\gamma_1,\gamma_2,\ldots,\gamma_k)$ be a vector such that $\gamma_j \ge 0$, $j = 1,2,\ldots,k$, and such that $\sum_j \gamma_j > 0$. Let $Z_{\gamma_j}$, $j = 1,2,\ldots,k$, be independent Gamma random variables with scale parameter 1 and shape parameters $\gamma_j$, $j = 1,2,\ldots,k$, respectively, where a Gamma variable with shape parameter 0 is taken to be degenerate at 0. Let $Z = \sum_j Z_{\gamma_j}$ and $y_j = Z_{\gamma_j}/Z$, $j = 1,2,\ldots,k$. The joint distribution of the random vector $(y_1,y_2,\ldots,y_k)$, taking values in
$$\mathcal{P}_k = \Big\{(p_1,p_2,\ldots,p_k) : p_1 \ge 0, p_2 \ge 0, \ldots, p_k \ge 0, \sum_j p_j = 1\Big\},$$
the unit simplex of $\mathbb{R}^k$, is defined to be the $k$-dimensional Dirichlet measure $\mathcal{D}_{(\gamma_1,\gamma_2,\ldots,\gamma_k)}$. Let $e_j$ denote the $k$-dimensional vector consisting of 0's, except for the $j$th coordinate, which is equal to 1. Notice that the Dirichlet measure $\mathcal{D}_{e_j}$ puts all its probability mass at the point $e_j$. Furthermore, it is interesting to note that $\mathcal{D}_{2e_j} = \mathcal{D}_{e_j}$. This fact will be used later in the proof of Theorem 4.3.
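The Gamma-normalization definition above translates directly into a sampler. The following is a minimal Python sketch using only the standard library; the function name `sample_dirichlet` is our own, and a shape parameter of 0 is handled by the convention stated above (the corresponding Gamma variable is degenerate at 0), which is what makes $\mathcal{D}_{e_j}$ and $\mathcal{D}_{2e_j}$ both put all their mass at $e_j$.

```python
import random


def sample_dirichlet(gammas, rng=random):
    """Draw one point of the simplex P_k from D_(gamma_1, ..., gamma_k).

    Z_j ~ Gamma(shape=gamma_j, scale=1) independently and y_j = Z_j / sum(Z).
    A shape of 0 is treated as a Gamma variable degenerate at 0.
    """
    if any(g < 0 for g in gammas) or sum(gammas) <= 0:
        raise ValueError("need gamma_j >= 0 and sum(gamma_j) > 0")
    z = [rng.gammavariate(g, 1.0) if g > 0 else 0.0 for g in gammas]
    total = sum(z)
    return [zj / total for zj in z]


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_dirichlet([0, 1, 0], rng))   # D_{e_2}: always [0.0, 1.0, 0.0]
    print(sample_dirichlet([0, 2, 0], rng))   # D_{2 e_2}: the same point
    draws = [sample_dirichlet([1.0, 2.0, 3.0], rng) for _ in range(5000)]
    print([sum(d[j] for d in draws) / len(draws) for j in range(3)])  # near [1/6, 1/3, 1/2]
```

The Monte Carlo means in the usage lines only approximate $\gamma_j/\sum_i\gamma_i$; the two degenerate cases are exact.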
The intuitive definition of a Dirichlet measure in the general case is easy to give. Let $\alpha$ be a non-zero finite measure on $(\mathcal{X},\mathcal{B})$. A probability distribution $\nu$ on $(\mathcal{P},\mathcal{C})$ is said to be a Dirichlet measure with parameter $\alpha$ if, for every measurable partition $\{B_1,B_2,\ldots,B_k\}$ of $\mathcal{X}$, the distribution of $(P(B_1),P(B_2),\ldots,P(B_k))$ under $\nu$ is the finite-dimensional Dirichlet distribution $\mathcal{D}_{(\alpha(B_1),\alpha(B_2),\ldots,\alpha(B_k))}$. When such a probability measure $\nu$ on $(\mathcal{P},\mathcal{C})$ can be demonstrated to exist, it will be denoted by $\mathcal{D}_\alpha$.

Dirichlet measures have three main properties that make them useful in Bayesian nonparametrics. Apart from having finite-dimensional Dirichlet distributions as their marginals, they possess the following properties:

P1 $\mathcal{D}_\alpha$ is a probability measure on $(\mathcal{P},\mathcal{C})$,

P2 $\mathcal{D}_\alpha$ gives probability one to the subset of all discrete probability measures on $(\mathcal{X},\mathcal{B})$, and

P3 the posterior distribution $\nu_X$ under the prior $\nu = \mathcal{D}_\alpha$ is the Dirichlet measure $\mathcal{D}_{\alpha+\delta_X}$, where $\delta_X$ is the probability measure degenerate at $X$.

This paper gives a constructive definition of a Dirichlet measure and shows that these three properties hold.

Ferguson (1973) argued that the distributions of $(P(B_1),P(B_2),\ldots,P(B_k))$ give rise to a consistent family of measures over the class of all partitions $(B_1,B_2,\ldots,B_k)$. By the Kolmogorov consistency theorem, this gives rise to a unique probability measure on $[0,1]^{\mathcal{B}}$ with its associated Kolmogorov $\sigma$-field. Furthermore, for any given sequence of disjoint measurable sets $B_1,B_2,\ldots$, the probability is one that
$$P\Big(\bigcup_i B_i\Big) = \sum_i P(B_i), \qquad (1.1)$$
where $P(\cdot)$ is the canonical representation of a point in $[0,1]^{\mathcal{B}}$. This set of probability one may depend on the sequence $B_1,B_2,\ldots$. Such a $P$ is a member of $\mathcal{P}$ if and only if (1.1) holds for all disjoint sequences $B_1,B_2,\ldots$. The collection of such disjoint sequences is uncountable. This presents a problem in making this definition rigorous and establishing property P1. For the special case where $\mathcal{X}$ is the real line, or more generally a separable complete metric space, one can use a result of Harris (1968, Lemma 6.1). This result states that verifying (1.1) for a suitably chosen countable collection of disjoint sequences of sets is sufficient to ensure that (1.1) holds for all disjoint sequences of sets and that the set function $P$ is a probability measure. An appeal to this result is one way to show that there is a probability measure on $(\mathcal{P},\mathcal{C})$ with the required properties, and this defines the Dirichlet measure $\mathcal{D}_\alpha$.

In a later section, Ferguson ((1973), Section 4) gives an alternative constructive definition of the Dirichlet measure, which shows that it gives probability one to the subset of discrete probability measures. However, it takes some effort to see that the two definitions are equivalent.

Ferguson (1973) also establishes the posterior distribution property P3 by using a very peculiar definition (see his Definition 2) for the joint distribution of $(P,X)$.

Blackwell and MacQueen (1973) appeal to the famous theorem of de Finetti to show that there is a one-to-one correspondence between sequences of exchangeable random variables and probability measures on $(\mathcal{P},\mathcal{C})$. A particular case of exchangeable random variables, namely the generalized Pólya urn scheme, corresponds to the Dirichlet measure. In this paper and in Blackwell (1973), they establish the three properties P1, P2 and P3. Their proof is elegant but quite indirect and also requires the space $\mathcal{X}$ to be a separable complete metric space.

Freedman (1963) and Fabius (1964) contain early work on tail-free priors, which include Dirichlet priors, for the case when $\mathcal{X}$ is the set of integers or $[0,1]$.

Let $\mathcal{L}$ be the usual Borel $\sigma$-field restricted to $[0,1]$. In Section 2, we define a function $P$ based on a sequence of i.i.d. random variables $(\theta_n,Y_n)$, $n = 1,2,\ldots$, taking values in $([0,1]\times\mathcal{X},\mathcal{L}\times\mathcal{B})$; see (2.1). By its very definition, $P$ is a random measure taking values in $(\mathcal{P},\mathcal{C})$ and giving probability one to the subset of discrete probability measures on $(\mathcal{X},\mathcal{B})$. This establishes properties P1 and P2. We give a direct proof, in Theorem 3.4 of Section 3, that the finite-dimensional marginal distributions of $P$ are Dirichlet distributions. This establishes that the distribution of $P$ is a Dirichlet measure. In Theorem 4.3 of Section 4 we prove property P3, thus establishing that the posterior distribution is also a Dirichlet measure. The definition and proofs are all given in some detail to make this paper self-contained.

This constructive definition of a Dirichlet measure was announced in a paper on convergence of Dirichlet measures, Sethuraman and Tiwari (1982). This definition has since been used by several authors to greatly simplify previous calculations and to obtain new calculations involving Dirichlet measures; see, for instance, Ferguson (1983), Ferguson, Phadia and Tiwari (1991), and Kumar and Tiwari (1989).

2. Constructive definition of the Dirichlet measure

Let $\alpha$ be a non-zero finite measure on $(\mathcal{X},\mathcal{B})$. Let $\beta(B) = \alpha(B)/\alpha(\mathcal{X})$ be the normalized probability measure arising from $\alpha$. Let $\mathrm{B}(\gamma,\delta)$ stand for the Beta distribution on $[0,1]$ with parameters $\gamma$ and $\delta$. This Beta distribution is the marginal distribution of the first coordinate of the Dirichlet measure $\mathcal{D}_{(\gamma,\delta)}$ on the two-dimensional simplex $\mathcal{P}_2$ defined earlier. Let $\mathcal{N} = \{1,2,\ldots\}$ be the set of positive integers and let $\mathcal{F}$ be the $\sigma$-field of all subsets of $\mathcal{N}$. Let $(\Omega,\mathcal{S},Q)$ be a probability space supporting a collection of random variables $(\theta,Y,I) = ((\theta_j,Y_j), j = 1,2,\ldots; I)$ taking values in $(([0,1]\times\mathcal{X})^\infty\times\mathcal{N}, (\mathcal{L}\times\mathcal{B})^\infty\times\mathcal{F})$, with a joint distribution defined as follows. The random variables $(\theta_1,\theta_2,\ldots)$ are i.i.d. with a common Beta distribution $\mathrm{B}(1,\alpha(\mathcal{X}))$. The random variables $(Y_1,Y_2,\ldots)$ are independent of $(\theta_1,\theta_2,\ldots)$ and i.i.d. among themselves with common distribution $\beta$. Let $p_1 = \theta_1$ and $p_n = \theta_n\prod_{1\le m\le n-1}(1-\theta_m)$ for $n = 2,3,\ldots$. Notice that $\sum_{1\le m\le n}p_m = 1 - \prod_{1\le m\le n}(1-\theta_m) \to 1$ with $Q$-probability one, since the product is non-increasing in $n$ and its expectation $(\alpha(\mathcal{X})/(\alpha(\mathcal{X})+1))^n$ tends to 0. Let $Q(I = n\mid(\theta,Y)) = p_n$, $n = 1,2,\ldots$. The existence of a probability space $(\Omega,\mathcal{S},Q)$ and such a sequence of random variables $(\theta,Y,I)$ follows from the usual construction of a product measure and does not require any restrictions on $(\mathcal{X},\mathcal{B})$, such as its being a separable complete metric space.
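The weights $p_n$ and the identity $\sum_{m\le n}p_m = 1-\prod_{m\le n}(1-\theta_m)$ are easy to check numerically. The sketch below is only an illustration under assumptions of our own: the helper name `stick_breaking_weights` is hypothetical, and $\alpha(\mathcal{X}) = 2$ is an arbitrary choice for the usage lines.

```python
import random


def stick_breaking_weights(thetas):
    """Return ([p_1, ..., p_n], prod_{m<=n} (1 - theta_m)) for a finite list of thetas.

    p_1 = theta_1 and p_n = theta_n * prod_{m<n} (1 - theta_m).
    """
    weights, remaining = [], 1.0          # remaining = prod_{m<n} (1 - theta_m)
    for theta in thetas:
        weights.append(theta * remaining)
        remaining *= 1.0 - theta
    return weights, remaining


if __name__ == "__main__":
    rng = random.Random(0)
    a_total = 2.0                          # assumed value of alpha(X)
    thetas = [rng.betavariate(1.0, a_total) for _ in range(50)]
    p, leftover = stick_breaking_weights(thetas)
    print(sum(p), 1.0 - leftover)          # equal up to rounding, both close to 1
```

The returned leftover mass is the quantity whose expectation, $(\alpha(\mathcal{X})/(\alpha(\mathcal{X})+1))^n$, drives the convergence argument above.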

Define
$$P(\theta,Y;B) = P(B) = \sum_{n=1}^{\infty} p_n\,\delta_{Y_n}(B), \qquad (2.1)$$
where $\delta_x(\cdot)$ stands for the probability measure degenerate at $x$.

This is the new constructive definition of a Dirichlet measure. As convenience dictates, we drop all or part of the arguments $(\theta,Y)$, $B$ and denote the random measure in (2.1) by $P$, for simplicity of notation. Since $P$ is clearly a measurable map from $(\Omega,\mathcal{S})$ into $(\mathcal{P},\mathcal{C})$ and takes values in the subset of discrete probability measures, properties P1 and P2 are self-evident.
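Because (2.1) is an explicit series, a draw of $P$ can be approximated by stopping once the unassigned mass $\prod_{m\le n}(1-\theta_m)$ is negligible. The following Python sketch does this; the truncation tolerance, the function names, and the choice of $\beta$ as the uniform distribution on $[0,1)$ are assumptions made for illustration, not part of the definition.

```python
import random


def sample_stick_breaking_measure(a_total, base_sampler, rng, tol=1e-10):
    """Approximate draw of P = sum_n p_n * delta_{Y_n} from (2.1).

    theta_n ~ B(1, alpha(X)) and Y_n ~ beta are generated until the leftover
    mass prod_{m<=n} (1 - theta_m) drops below `tol`; that leftover is dropped
    (a truncation, not part of the definition). Returns (atom, weight) pairs.
    """
    atoms, remaining = [], 1.0
    while remaining > tol:
        theta = rng.betavariate(1.0, a_total)
        atoms.append((base_sampler(rng), theta * remaining))
        remaining *= 1.0 - theta
    return atoms


def measure_of(atoms, indicator):
    """P(B) for a set B described by the predicate `indicator`."""
    return sum(w for y, w in atoms if indicator(y))


def uniform_base(rng):
    """Assumed base measure beta: uniform on [0, 1)."""
    return rng.random()


if __name__ == "__main__":
    rng = random.Random(0)
    atoms = sample_stick_breaking_measure(1.5, uniform_base, rng)
    print(len(atoms), measure_of(atoms, lambda y: True))   # total mass close to 1
    print(measure_of(atoms, lambda y: y < 0.3))            # one draw of P([0, 0.3))
```

Every draw is a finite weighted sum of point masses, which is the discreteness asserted by P2; the truncation only affects mass below `tol`.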

Notice that the random variable $I$ introduced above has not been used in the definition of $P$. It will be used later, in Section 4, to prove the posterior distribution property P3.

A more direct way to describe the constructive definition in (2.1) is as follows. Let $Y_1,Y_2,\ldots$ be i.i.d. with common distribution $\beta$. Let $\{p_1,p_2,\ldots\}$ be the probabilities from a discrete distribution on the integers with discrete failure rates $\{\theta_1,\theta_2,\ldots\}$, which are i.i.d. with a Beta distribution $\mathrm{B}(1,\alpha(\mathcal{X}))$. Let $P$ be the random probability measure that puts weight $p_n$ on the degenerate measure $\delta_{Y_n}$, $n = 1,2,\ldots$. This is the random probability measure $P$ described in (2.1). The alternative definition given in Ferguson ((1973), Section 4) uses a different set of random weights, which are arranged in decreasing order. The use of unordered weights in this paper simplifies all our calculations. It is interesting to note that the weights used by Ferguson (1973) are equivalent to our weights rearranged in decreasing order. However, it is not clear that there is an easy way to unorder the weights of Ferguson (1973) to obtain weights with the simple structure of (2.1).
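The failure-rate description also suggests one concrete way to realize the index $I$ with $Q(I = n\mid(\theta,Y)) = p_n$: run independent Bernoulli trials with success probabilities $\theta_1,\theta_2,\ldots$ and stop at the first success. In this hypothetical sketch the $\theta_n$ are fixed numbers chosen for illustration rather than Beta draws, so that the target probabilities $p_n$ can be written down by hand.

```python
import random


def sample_index_by_failure_rate(thetas, rng):
    """Return the first n (1-based) at which an independent Bernoulli(theta_n) trial succeeds.

    Reading thetas as discrete failure (hazard) rates, Q(I = n | theta) = p_n,
    the stick-breaking weight. Returns None if no trial succeeds within len(thetas).
    """
    for n, theta in enumerate(thetas, start=1):
        if rng.random() < theta:
            return n
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    thetas = [0.5, 0.25, 0.2, 1.0]     # illustrative values; theta_4 = 1 ends the list
    # p = [0.5, 0.5*0.25, 0.5*0.75*0.2, 0.5*0.75*0.8*1.0] = [0.5, 0.125, 0.075, 0.3]
    counts = [0, 0, 0, 0]
    for _ in range(10000):
        counts[sample_index_by_failure_rate(thetas, rng) - 1] += 1
    print([c / 10000 for c in counts])
```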

3. The distribution of the random measure $P$ is $\mathcal{D}_\alpha$.

We will digress a little before establishing that the distribution of $P$ is the Dirichlet measure $\mathcal{D}_\alpha$.

Let $\theta^*_n = \theta_{n+1}$, $Y^*_n = Y_{n+1}$, $n = 1,2,\ldots$, and let $J = I - 1$. Define $(\theta^*,Y^*,J) = ((\theta^*_n,Y^*_n), n = 1,2,\ldots; J)$.

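Notice that, since $p_{n+1} = (1-\theta_1)\,\theta^*_n\prod_{1\le m\le n-1}(1-\theta^*_m)$ for $n = 1,2,\ldots$,
$$P(\theta,Y;B) = \theta_1\delta_{Y_1}(B) + (1-\theta_1)P(\theta^*,Y^*;B). \qquad (3.1)$$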


Notice that $(\theta^*,Y^*)$ has the same distribution as $(\theta,Y)$ and is independent of $(\theta_1,Y_1)$. Thus we can rewrite (3.1) as the following distributional equation for $P$:
$$P \stackrel{d}{=} \theta_1\delta_{Y_1} + (1-\theta_1)P, \qquad (3.2)$$
where on the right-hand side $P$ is independent of $(\theta_1,Y_1)$.
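Identity (3.1) is exact, not merely distributional, because each weight built from $\theta$ satisfies $p_{n+1} = (1-\theta_1)p^*_n$, where $p^*_n$ is the weight built from $\theta^*$. A minimal numerical check on a finite list of atoms might look like the sketch below; the set $B = [0,0.4)$, the uniform choice of $\beta$, and all function names are illustrative assumptions.

```python
import random


def stick_weights(thetas):
    """Weights p_n = theta_n * prod_{m<n} (1 - theta_m) for a finite list of thetas."""
    out, remaining = [], 1.0
    for t in thetas:
        out.append(t * remaining)
        remaining *= 1.0 - t
    return out


def truncated_measure(thetas, ys, indicator):
    """P(theta, Y; B) from (2.1), summed over the finitely many atoms supplied."""
    return sum(p for p, y in zip(stick_weights(thetas), ys) if indicator(y))


def in_b(y):
    """Hypothetical example set B = [0, 0.4)."""
    return y < 0.4


if __name__ == "__main__":
    rng = random.Random(1)
    thetas = [rng.betavariate(1.0, 2.0) for _ in range(30)]
    ys = [rng.random() for _ in range(30)]
    lhs = truncated_measure(thetas, ys, in_b)
    # Right side of (3.1), with theta*_n = theta_{n+1} and Y*_n = Y_{n+1}
    rhs = thetas[0] * in_b(ys[0]) + (1 - thetas[0]) * truncated_measure(thetas[1:], ys[1:], in_b)
    print(lhs, rhs)  # agree up to floating-point rounding
```

Both sides use the same 30 atoms, so the truncation cancels and any difference is pure rounding.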

Theorem 3.4 below uses the distributional equation (3.2) to show that the distribution of $P$ is the Dirichlet measure $\mathcal{D}_\alpha$. The proof of this theorem uses well-known facts about finite-dimensional Dirichlet measures and a result on the uniqueness of solutions to distributional equations, which are given below as Lemmas 3.1, 3.2 and 3.3.

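Lemma 3.1 Let $\gamma = (\gamma_1,\gamma_2,\ldots,\gamma_k)$ and $\delta = (\delta_1,\delta_2,\ldots,\delta_k)$ be $k$-dimensional vectors. Let $U,V$ be independent $k$-dimensional random vectors with Dirichlet distributions $\mathcal{D}_\gamma$ and $\mathcal{D}_\delta$, respectively. Let $W$ be independent of $(U,V)$ and have a Beta distribution $\mathrm{B}(\bar\gamma,\bar\delta)$, where $\bar\gamma = \sum_j\gamma_j$ and $\bar\delta = \sum_j\delta_j$. Then the distribution of $WU + (1-W)V$ is the Dirichlet distribution $\mathcal{D}_{\gamma+\delta}$.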
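Lemma 3.1 can be checked by simulation. The sketch below compares the empirical first and second moments of each coordinate of $WU + (1-W)V$ with the exact moments of $\mathcal{D}_{\gamma+\delta}$, namely $c_j/\bar c$ and $c_j(c_j+1)/(\bar c(\bar c+1))$ with $c = \gamma+\delta$ and $\bar c = \sum_j c_j$. The parameter values and function names are arbitrary choices for illustration; agreement of moments is evidence for, not a proof of, the lemma.

```python
import random


def sample_dirichlet(shape, rng):
    """One draw from D_shape by Gamma normalization (a shape of 0 gives a coordinate 0)."""
    z = [rng.gammavariate(a, 1.0) if a > 0 else 0.0 for a in shape]
    s = sum(z)
    return [x / s for x in z]


def lemma_3_1_draw(gamma, delta, rng):
    """W*U + (1 - W)*V with U ~ D_gamma, V ~ D_delta, W ~ B(sum gamma, sum delta), all independent."""
    u = sample_dirichlet(gamma, rng)
    v = sample_dirichlet(delta, rng)
    w = rng.betavariate(sum(gamma), sum(delta))
    return [w * ui + (1.0 - w) * vi for ui, vi in zip(u, v)]


def empirical_moments(draws, j):
    """Sample means of y_j and of y_j**2."""
    n = len(draws)
    return sum(d[j] for d in draws) / n, sum(d[j] ** 2 for d in draws) / n


def dirichlet_moments(shape, j):
    """Exact E[y_j] and E[y_j**2] under D_shape."""
    c, cj = sum(shape), shape[j]
    return cj / c, cj * (cj + 1) / (c * (c + 1))


if __name__ == "__main__":
    rng = random.Random(2024)
    gamma, delta = [1.0, 2.0, 0.5], [0.5, 1.0, 1.5]     # arbitrary example parameters
    target = [g + d for g, d in zip(gamma, delta)]       # gamma + delta
    draws = [lemma_3_1_draw(gamma, delta, rng) for _ in range(20000)]
    for j in range(3):
        print(j, empirical_moments(draws, j), dirichlet_moments(target, j))
```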
Lemma 3.2 Let $\gamma = (\gamma_1,\gamma_2,\ldots,\gamma_k)$, let $\bar\gamma = \sum_j\gamma_j$, and let $\beta_j = \gamma_j/\bar\gamma$, $j = 1,2,\ldots,k$. Then
$$\sum_{j=1}^{k}\beta_j\,\mathcal{D}_{\gamma+e_j} = \mathcal{D}_\gamma.$$


The proofs of these two lemmas are found in many standard textbooks, for instance in Wilks ((1962), Section 7).
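When all $\gamma_j > 0$, Lemma 3.2 can be verified exactly at the level of densities: the $j$th term of the mixture has density $\Gamma(\bar\gamma)/\prod_i\Gamma(\gamma_i)\cdot\prod_i p_i^{\gamma_i-1}\cdot p_j$, and summing over $j$ uses $\sum_j p_j = 1$. The sketch below checks this numerically at a few interior points of the simplex. It assumes strictly positive $\gamma_j$, so that $\mathcal{D}_\gamma$ has a density, and the helper names are hypothetical.

```python
import math


def dirichlet_logpdf(p, shape):
    """Log density of D_shape at an interior simplex point p (all shape entries > 0).

    Density with respect to Lebesgue measure on the first k-1 coordinates:
    Gamma(sum a) / prod Gamma(a_i) * prod p_i**(a_i - 1).
    """
    out = math.lgamma(sum(shape)) - sum(math.lgamma(a) for a in shape)
    return out + sum((a - 1.0) * math.log(x) for a, x in zip(shape, p))


def mixture_density(p, gamma):
    """Density at p of sum_j beta_j * D_{gamma + e_j}, where beta_j = gamma_j / sum(gamma)."""
    g_bar = sum(gamma)
    total = 0.0
    for j, g_j in enumerate(gamma):
        shifted = [g + (1.0 if i == j else 0.0) for i, g in enumerate(gamma)]
        total += (g_j / g_bar) * math.exp(dirichlet_logpdf(p, shifted))
    return total


if __name__ == "__main__":
    gamma = [0.7, 1.3, 2.5]                  # arbitrary positive parameters
    for p in ([0.2, 0.3, 0.5], [0.6, 0.1, 0.3]):
        print(mixture_density(p, gamma), math.exp(dirichlet_logpdf(p, gamma)))  # equal up to rounding
```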

Lemma 3.3, stated and proved below, shows that certain distributional equations have unique solutions. Such results appear in several areas of statistics, notably in renewal theory. For a recent work which gives more general results, see Goldie (1991). The following lemma is sufficient for our purposes. Its proof, which is not new, is given here to make this paper self-contained.

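Lemma 3.3 Let $W,U,V$ be random variables, where $W$ is real-valued and $U,V$ take values in a normed linear space. Suppose that $V$ is independent of $(W,U)$ and satisfies the distributional equation
$$V \stackrel{d}{=} U + WV. \qquad (3.3)$$
Suppose that $|W| \le 1$ with probability one and that $\Pr(|W| = 1) < 1$. Then there is only one distribution for $V$ that satisfies (3.3).

Proof: Let $V$ and $V'$ be two random variables whose distributions are not equal but which both satisfy equation (3.3). Let $(W_n,U_n)$, $n = 1,2,\ldots$, be independent copies of $(W,U)$ which are independent of $(V,V')$. Let $V_1 = V$, $V'_1 = V'$ and define, recursively,
$$V_{n+1} = U_n + W_nV_n \quad\text{and}\quad V'_{n+1} = U_n + W_nV'_n$$
for $n = 1,2,\ldots$. From the distributional equation (3.3), the $V_n$'s have the same distribution as $V$ and the $V'_n$'s have the same distribution as $V'$. However,
$$|V_{n+1} - V'_{n+1}| = |W_n|\,|V_n - V'_n| = \prod_{1\le m\le n}|W_m|\;|V_1 - V'_1| \to 0$$
with probability 1, since the $W_n$'s are i.i.d. with $|W_n| \le 1$ and, for some $\epsilon > 0$, $\Pr(|W| \le 1-\epsilon) > 0$, so that, by the Borel-Cantelli lemma, infinitely many of the factors are at most $1-\epsilon$. Thus $V_n - V'_n$ tends to 0 almost surely while, for every $n$, $V_n$ has the distribution of $V$ and $V'_n$ that of $V'$; this forces the two distributions to coincide. This contradicts the supposition that the distributions of $V$ and $V'$ are unequal and proves that the distribution of $V$ satisfying (3.3) is unique. □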
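The coupling in this proof can be watched directly. In the sketch below (an illustration with hypothetical names), $W = 1-\theta_1$ and $U = \theta_1 D$ with $\theta_1 \sim \mathrm{B}(1,\alpha(\mathcal{X}))$ and $D$ a Bernoulli variable with success probability $\beta(B_1)$; this is the first coordinate of equation (3.4) in the proof of Theorem 3.4 for a two-set partition, with the assumed values $\alpha(\mathcal{X}) = 2$ and $\beta(B_1) = 0.3$. Two copies started at 0 and 1 and driven by the same $(U_n,W_n)$ merge, and iterating from a fixed start gives approximate draws from the unique solution, which Theorem 3.4 identifies as $\mathrm{B}(0.6,1.4)$ with mean 0.3.

```python
import random


def coupled_gap(a_total, beta_b1, n_steps, rng):
    """|V_{n+1} - V'_{n+1}| for two copies of V_{n+1} = U_n + W_n V_n started at 0 and 1,
    driven by the same (U_n, W_n) with W = 1 - theta and U = theta * D."""
    v, v_prime = 0.0, 1.0
    for _ in range(n_steps):
        theta = rng.betavariate(1.0, a_total)
        u = theta if rng.random() < beta_b1 else 0.0   # U = theta * D, D ~ Bernoulli(beta(B_1))
        w = 1.0 - theta                                 # W = 1 - theta
        v, v_prime = u + w * v, u + w * v_prime
    return abs(v - v_prime)


def approx_solution_draw(a_total, beta_b1, n_steps, rng):
    """Iterate the same recursion from V_1 = 0; for large n_steps the result is
    approximately distributed as the unique solution of (3.3)."""
    v = 0.0
    for _ in range(n_steps):
        theta = rng.betavariate(1.0, a_total)
        u = theta if rng.random() < beta_b1 else 0.0
        v = u + (1.0 - theta) * v
    return v


if __name__ == "__main__":
    rng = random.Random(0)
    print(coupled_gap(2.0, 0.3, 200, rng))     # essentially 0: prod (1 - theta_m) is tiny
    draws = [approx_solution_draw(2.0, 0.3, 60, rng) for _ in range(2000)]
    print(sum(draws) / len(draws))             # close to 0.3
```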

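Theorem 3.4 Let $\{B_1,B_2,\ldots,B_k\}$ be a measurable partition of $\mathcal{X}$ and let $\mathbf{P} = (P(B_1),P(B_2),\ldots,P(B_k))$. Then the distribution of $\mathbf{P}$ is the $k$-dimensional Dirichlet measure $\mathcal{D}_{(\alpha(B_1),\alpha(B_2),\ldots,\alpha(B_k))}$.

Proof: Let $D = (\delta_{Y_1}(B_1),\delta_{Y_1}(B_2),\ldots,\delta_{Y_1}(B_k))$. Notice that $Q(D = e_j) = Q(Y_1\in B_j) = \beta(B_j)$, $j = 1,2,\ldots,k$. From (3.2) we see that $\mathbf{P}$ satisfies the distributional equation
$$\mathbf{P} \stackrel{d}{=} \theta_1 D + (1-\theta_1)\mathbf{P}, \qquad (3.4)$$
where, on the right, $\theta_1$ has a Beta distribution $\mathrm{B}(1,\alpha(\mathcal{X}))$, $D$ is independent of $\theta_1$ and takes the value $e_j$ with probability $\beta(B_j)$, $j = 1,2,\ldots,k$, and the $k$-dimensional random vector $\mathbf{P}$ is independent of $(\theta_1,D)$.

We will first verify that the $k$-dimensional Dirichlet measure for $\mathbf{P}$ satisfies the distributional equation (3.4) and then show that this solution is the unique solution.

Let the distribution of $\mathbf{P}$ on the right of (3.4) be the $k$-dimensional Dirichlet measure $\mathcal{D}_{(\alpha(B_1),\alpha(B_2),\ldots,\alpha(B_k))}$. The $k$-dimensional Dirichlet measure $\mathcal{D}_{e_j}$ gives probability 1 to $e_j$. Given that $D = e_j$, the distribution of $\theta_1 D + (1-\theta_1)\mathbf{P}$ is therefore the distribution of $\theta_1 U + (1-\theta_1)V$ with $U \sim \mathcal{D}_{e_j}$ and $V \sim \mathcal{D}_{(\alpha(B_1),\alpha(B_2),\ldots,\alpha(B_k))}$, and this, by Lemma 3.1 (note that $\theta_1 \sim \mathrm{B}(1,\alpha(\mathcal{X}))$ and $\sum_j\alpha(B_j) = \alpha(\mathcal{X})$), is $\mathcal{D}_{(\alpha(B_1),\alpha(B_2),\ldots,\alpha(B_k))+e_j}$. Summing over the distribution of $D$ is equivalent to taking a mixture of these Dirichlet measures with weights $\beta(B_j) = \alpha(B_j)/\alpha(\mathcal{X})$, which, by Lemma 3.2, is equal to $\mathcal{D}_{(\alpha(B_1),\alpha(B_2),\ldots,\alpha(B_k))}$. This verifies that the $k$-dimensional Dirichlet measure satisfies the distributional equation (3.4). Lemma 3.3, applied with $W = 1-\theta_1$ (which lies in $[0,1]$ and equals 1 with probability 0) and $U = \theta_1 D$, shows that this solution is unique. This completes the proof of Theorem 3.4. □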
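Theorem 3.4 can be illustrated by simulating the construction (2.1) and comparing the partition masses with the Dirichlet moments. The sketch below assumes, for illustration only, that $\alpha(\mathcal{X}) = 2$, that $\beta$ is uniform on $[0,1)$, and that the partition is $B_1 = [0,0.2)$, $B_2 = [0.2,0.5)$, $B_3 = [0.5,1)$, so that $(\alpha(B_1),\alpha(B_2),\alpha(B_3)) = (0.4,0.6,1.0)$. The series is truncated, and the function names are hypothetical.

```python
import bisect
import random


def truncated_stick_breaking(a_total, base_sampler, rng, tol=1e-8):
    """(atom, weight) pairs of P in (2.1), stopped once the leftover mass is below tol."""
    atoms, remaining = [], 1.0
    while remaining > tol:
        theta = rng.betavariate(1.0, a_total)
        atoms.append((base_sampler(rng), theta * remaining))
        remaining *= 1.0 - theta
    return atoms


def partition_masses(atoms, cuts):
    """(P(B_1), ..., P(B_k)) for the intervals of [0, 1) split at the sorted points `cuts`."""
    masses = [0.0] * (len(cuts) + 1)
    for y, w in atoms:
        masses[bisect.bisect_right(cuts, y)] += w
    return masses


def uniform_base(rng):
    """Assumed base measure beta: uniform on [0, 1)."""
    return rng.random()


if __name__ == "__main__":
    rng = random.Random(0)
    a_total, cuts = 2.0, [0.2, 0.5]
    alpha_cells = [0.4, 0.6, 1.0]            # alpha(B_j) = a_total * length(B_j)
    draws = [partition_masses(truncated_stick_breaking(a_total, uniform_base, rng), cuts)
             for _ in range(4000)]
    for j, a_j in enumerate(alpha_cells):
        mean = sum(d[j] for d in draws) / len(draws)
        print(j, mean, a_j / a_total)        # Dirichlet mean alpha(B_j) / alpha(X)
```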

4. The posterior distribution of $P$ is $\mathcal{D}_{\alpha+\delta_X}$.

Let $X = Y_I$. Then $X$ is a random variable from $(\Omega,\mathcal{S})$ into $\mathcal{X}$, defined explicitly as a function of $(\theta,Y,I)$. The next lemma shows that the distribution of $X$ given $P$ is $P$, and hence the joint distribution of $(P,X)$ is that of the parameter and observation in a Bayesian nonparametric problem.

Lemma 4.1 The distribution of $X$ given $P$ is $P$.

Proof: Let $B \in \mathcal{B}$. Since $X = Y_n$ on $\{I = n\}$ and $Y_n$ is a function of $(\theta,Y)$, a direct calculation gives
$$Q(X\in B\mid(\theta,Y)) = \sum_n Q(X\in B\mid I = n,(\theta,Y))\,Q(I = n\mid(\theta,Y)) = \sum_n \delta_{Y_n}(B)\,p_n = P(B).$$
Since this conditional probability is a function of $P$, it immediately follows that $Q(\cdot\mid P)$ exists as a regular conditional probability and $Q(X\in B\mid P) = P(B)$ with $Q$-probability 1. □
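Lemma 4.1 says that, once the atoms and weights are fixed, drawing $I$ with probabilities $p_n$ and reporting $Y_I$ is the same as sampling from $P$. The sketch below checks this by simulation for one truncated draw of $P$; because of the truncation the weights are renormalized by `random.choices`, and the base measure, parameter values, and function names are assumptions for illustration.

```python
import random


def truncated_stick_breaking(a_total, base_sampler, rng, tol=1e-8):
    """(atom, weight) pairs of P in (2.1), stopped once the leftover mass is below tol."""
    atoms, remaining = [], 1.0
    while remaining > tol:
        theta = rng.betavariate(1.0, a_total)
        atoms.append((base_sampler(rng), theta * remaining))
        remaining *= 1.0 - theta
    return atoms


def draw_x_given_p(atoms, rng):
    """X = Y_I with Q(I = n | theta, Y) proportional to p_n (renormalized after truncation)."""
    ys = [y for y, _ in atoms]
    ws = [w for _, w in atoms]
    return rng.choices(ys, weights=ws, k=1)[0]


def uniform_base(rng):
    """Assumed base measure beta: uniform on [0, 1)."""
    return rng.random()


if __name__ == "__main__":
    rng = random.Random(0)
    atoms = truncated_stick_breaking(1.0, uniform_base, rng)
    p_b = sum(w for y, w in atoms if y < 0.5) / sum(w for _, w in atoms)   # P([0, 0.5))
    hits = sum(1 for _ in range(10000) if draw_x_given_p(atoms, rng) < 0.5)
    print(p_b, hits / 10000)   # close to each other
```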

We now come to the posterior distribution of $P$, i.e. the distribution of $P$ given $X$. We do this by separately obtaining the conditional distribution of $(\theta,Y)$ given $I = 1$ and given $I > 1$. When $f$ and $g$ are functions of $(\theta,Y,I)$, we will use the notations $\mathcal{L}(f)$ and $\mathcal{L}(f\mid g)$ to denote the distribution of $f$ and the conditional distribution of $f$ given $g$, under $Q$, respectively.

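Lemma 4.2 The following are the conditional distributions of $(\theta,Y,I)$ given $I = 1$ and given $I > 1$:
$$\mathcal{L}\big((\theta_1,Y_1),(\theta^*,Y^*)\mid I = 1\big) = \mathrm{B}(2,\alpha(\mathcal{X}))\times\beta\times\mathcal{L}(\theta,Y) \qquad (4.1)$$
and
$$\mathcal{L}\big(\theta_1,Y_1,\theta^*,Y^*,J\mid I > 1\big) = \mathrm{B}(1,\alpha(\mathcal{X})+1)\times\beta\times\mathcal{L}(\theta,Y,I). \qquad (4.2)$$

Proof: Notice that $Q(I = 1\mid(\theta,Y)) = \theta_1$. Thus, if $A_i$ are Borel subsets of $[0,1]$ and $B_i \in \mathcal{B}$, $i = 1,2,\ldots,n$, we have the relation
$$Q\{\theta_i\in A_i, Y_i\in B_i, i = 1,2,\ldots,n, I = 1\} \propto \int \mathbf{1}(x_i\in A_i, y_i\in B_i, i = 1,2,\ldots,n)\; x_1\prod_{i=1}^{n}\big[(1-x_i)^{\alpha(\mathcal{X})-1}\,dx_i\,\beta(dy_i)\big].$$
This implies that, conditional on $I = 1$, $\theta_1$ has distribution $\mathrm{B}(2,\alpha(\mathcal{X}))$, the distributions of $\theta_i$, $i = 2,3,\ldots,n$, and $Y_i$, $i = 1,2,\ldots,n$, are all unchanged, and all these are independent. This gives all the finite-dimensional conditional distributions and proves (4.1). The proof of (4.2) follows along the same lines, since $Q(I > 1\mid(\theta,Y)) = 1-\theta_1$ and, for $n = 1,2,\ldots$, $Q(J = n\mid(\theta,Y), I > 1) = p_{n+1}/(1-\theta_1) = \theta^*_n\prod_{1\le m\le n-1}(1-\theta^*_m)$. □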
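The first factors of (4.1) and (4.2) can be checked by simulation: draw $\theta_1 \sim \mathrm{B}(1,\alpha(\mathcal{X}))$, let the event $\{I = 1\}$ occur with conditional probability $\theta_1$, and compare the average of $\theta_1$ on each event with the means $2/(\alpha(\mathcal{X})+2)$ of $\mathrm{B}(2,\alpha(\mathcal{X}))$ and $1/(\alpha(\mathcal{X})+2)$ of $\mathrm{B}(1,\alpha(\mathcal{X})+1)$. The value $\alpha(\mathcal{X}) = 3$ and the function name are illustrative assumptions.

```python
import random


def split_theta1_by_event(a_total, n_draws, rng):
    """Simulate theta_1 ~ B(1, alpha(X)) and the event {I = 1}, whose conditional
    probability is theta_1; return the theta_1 values observed on {I = 1} and on {I > 1}."""
    on_first, on_later = [], []
    for _ in range(n_draws):
        theta1 = rng.betavariate(1.0, a_total)
        if rng.random() < theta1:
            on_first.append(theta1)
        else:
            on_later.append(theta1)
    return on_first, on_later


def mean(xs):
    """Arithmetic mean of a non-empty list."""
    return sum(xs) / len(xs)


if __name__ == "__main__":
    rng = random.Random(0)
    a = 3.0
    first, later = split_theta1_by_event(a, 20000, rng)
    print(mean(first), 2.0 / (a + 2.0))   # B(2, a) mean
    print(mean(later), 1.0 / (a + 2.0))   # B(1, a + 1) mean
```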

Theorem 4.3 The posterior distribution of $P$ given $X$ is the Dirichlet measure $\mathcal{D}_{\alpha+\delta_X}$.

Proof: Let $P^* = P(\theta^*,Y^*)$. We can rewrite (3.1) as
$$P = \theta_1\delta_{Y_1} + (1-\theta_1)P^*. \qquad (4.3)$$
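When $I = 1$, we use (4.1) and obtain
$$\mathcal{L}(P\mid X, I = 1) = \mathcal{L}\big(\theta_1\delta_{Y_1} + (1-\theta_1)P^*\mid X, I = 1\big) = \mathcal{L}\big(\theta_1'\delta_X + (1-\theta_1')P'^*\big), \qquad (4.4)$$
where $\theta_1'$ has distribution $\mathrm{B}(2,\alpha(\mathcal{X}))$ and $P'^*$ is a random probability measure, independent of $\theta_1'$, whose distribution is the Dirichlet measure $\mathcal{D}_\alpha$. Here we used the facts that $X = Y_1$ on $\{I = 1\}$ and that, by (4.1), $\theta_1$, $Y_1$ and $P^* = P(\theta^*,Y^*)$ are conditionally independent given $I = 1$. The Dirichlet measure $\mathcal{D}_{\delta_X}$ puts all its mass on the degenerate measure $\delta_X$, and it is also equal to $\mathcal{D}_{2\delta_X}$. Since $\theta_1'$ has a Beta distribution $\mathrm{B}(2,\alpha(\mathcal{X}))$, this latter choice allows us to use Lemma 3.1 to obtain
$$\mathcal{L}(P\mid X, I = 1) = \mathcal{D}_{\alpha+2\delta_X}. \qquad (4.5)$$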

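When $I > 1$, we use (4.2) and first obtain
$$\mathcal{L}(\theta^*,Y^*,X\mid I > 1) = \mathcal{L}(\theta,Y,X), \qquad (4.6)$$
since $X = Y_I = Y^*_J$ on $\{I > 1\}$. Thus
$$\mathcal{L}(P\mid X, I > 1) = \mathcal{L}\big(\theta_1\delta_{Y_1} + (1-\theta_1)P^*\mid X, I > 1\big) = \mathcal{L}\big(\theta_1''\delta_{Y_1} + (1-\theta_1'')P''^*\big), \qquad (4.7)$$
where $Y_1$ has distribution $\beta$, $\theta_1''$ is independent of $Y_1$ and has distribution $\mathrm{B}(1,\alpha(\mathcal{X})+1)$, and $P''^*$ is a random probability measure, independent of $(Y_1,\theta_1'')$, whose distribution is $\mathcal{L}(P\mid X)$, in view of (4.6). We can combine (4.4) and (4.7) to obtain a distributional equation for $\mathcal{L}(P\mid X)$ as follows.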

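$$\mathcal{L}(P\mid X) = \mathcal{L}\Big(\Lambda\big(\theta_1'\delta_X + (1-\theta_1')P'^*\big) + (1-\Lambda)\big(\theta_1''\delta_{Y_1} + (1-\theta_1'')P''^*\big)\Big), \qquad (4.8)$$
where all the random variables on the right are independent and have the distributions previously specified, and the random variable $\Lambda$ takes the values 1 and 0 with probabilities $1/(\alpha(\mathcal{X})+1)$ and $\alpha(\mathcal{X})/(\alpha(\mathcal{X})+1)$, respectively. These are the conditional probabilities of $\{I = 1\}$ and $\{I > 1\}$ given $X$: since $X$ has distribution $\beta$ given each value of $I$, the variables $I$ and $X$ are independent, and $Q(I = 1) = E\theta_1 = 1/(\alpha(\mathcal{X})+1)$. Notice that the distribution of $P''^*$ is $\mathcal{L}(P\mid X)$, which makes (4.8) a distributional equation.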

From Lemma 3.3 we conclude that if there is a solution to (4.8), it is unique. We will now verify that $\mathcal{L}(P\mid X) = \mathcal{D}_{\alpha+\delta_X}$ satisfies the distributional equation (4.8). Relation (4.5) can be rewritten as
$$\mathcal{L}\big(\theta_1'\delta_X + (1-\theta_1')P'^*\big) = \mathcal{D}_{\alpha+2\delta_X}. \qquad (4.9)$$
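By conditioning on $Y_1$ and using Lemma 3.1, and then taking expectations with respect to $Y_1$, we find that, when $P''^*$ has distribution $\mathcal{D}_{\alpha+\delta_X}$,
$$\mathcal{L}\big(\theta_1''\delta_{Y_1} + (1-\theta_1'')P''^*\big) = E\big(\mathcal{D}_{\alpha+\delta_X+\delta_{Y_1}}\big), \qquad (4.10)$$
where $Y_1$ has distribution $\beta$. Let $Z$ be a random variable in $(\mathcal{X},\mathcal{B})$ with distribution
$$\frac{1}{\alpha(\mathcal{X})+1}\,\delta_X + \frac{\alpha(\mathcal{X})}{\alpha(\mathcal{X})+1}\,\beta = \frac{\alpha+\delta_X}{\alpha(\mathcal{X})+1}.$$
Combining (4.9) and (4.10), and using Lemma 3.2 on mixtures of Dirichlet measures, we conclude that the distribution of the random measure on the right-hand side of (4.8) is equal to
$$\frac{1}{\alpha(\mathcal{X})+1}\,\mathcal{D}_{\alpha+2\delta_X} + \frac{\alpha(\mathcal{X})}{\alpha(\mathcal{X})+1}\,E\big(\mathcal{D}_{\alpha+\delta_X+\delta_{Y_1}}\big) = E\big(\mathcal{D}_{\alpha+\delta_X+\delta_Z}\big) = \mathcal{D}_{\alpha+\delta_X}.$$
This proves that $\mathcal{D}_{\alpha+\delta_X}$ is the posterior distribution of $P$ given $X$. □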
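On a finite partition, Theorem 4.3 says that the posterior parameters are $\alpha(B_i) + \delta_X(B_i)$, that is, one unit is added to the cell containing $X$. The sketch below checks a consequence of this for $k = 3$: with $\mathbf{P} \sim \mathcal{D}_{(\alpha(B_1),\alpha(B_2),\alpha(B_3))}$ and $X$ drawn from $\mathbf{P}$, the average of $P(B_1)$ over draws with $X \in B_1$ should approach the posterior mean $(\alpha(B_1)+1)/(\alpha(\mathcal{X})+1)$. The parameter values and function names are assumptions for illustration, and the sampler requires all parameters to be positive.

```python
import random


def sample_dirichlet(shape, rng):
    """One draw from D_shape by Gamma normalization (all shape entries > 0)."""
    z = [rng.gammavariate(a, 1.0) for a in shape]
    s = sum(z)
    return [x / s for x in z]


def posterior_parameters(alpha_cells, j):
    """Partition parameters of D_{alpha + delta_X} when X falls in cell j:
    alpha(B_i) + delta_X(B_i), i.e. add 1 to the j-th cell."""
    return [a + (1.0 if i == j else 0.0) for i, a in enumerate(alpha_cells)]


def conditional_mean_by_simulation(alpha_cells, j, n_draws, rng):
    """Monte Carlo E[P(B_j) | X in B_j] when P ~ D_{alpha(B)} and X | P ~ P."""
    kept = []
    cells = list(range(len(alpha_cells)))
    for _ in range(n_draws):
        p = sample_dirichlet(alpha_cells, rng)
        if rng.choices(cells, weights=p, k=1)[0] == j:
            kept.append(p[j])
    return sum(kept) / len(kept)


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_cells = [0.5, 1.0, 1.5]                  # assumed alpha(B_1), alpha(B_2), alpha(B_3)
    post = posterior_parameters(alpha_cells, 0)    # [1.5, 1.0, 1.5]
    print(conditional_mean_by_simulation(alpha_cells, 0, 30000, rng), post[0] / sum(post))
```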